Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Have tidy ensure that we document all unsafe blocks in libcore #63793

Merged
merged 5 commits into from
Nov 8, 2019

Conversation

oli-obk
Copy link
Contributor

@oli-obk oli-obk commented Aug 21, 2019

cc @rust-lang/libs

I documented a few and added ignore flags on the other files. We can incrementally document the files, but won't regress any files this way.

@rust-highfive
Copy link
Collaborator

r? @KodrAus

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 21, 2019
@rust-highfive
Copy link
Collaborator

The job mingw-check of your PR failed (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
2019-08-21T18:12:19.8607065Z ##[command]git remote add origin https://github.com/rust-lang/rust
2019-08-21T18:12:19.8805627Z ##[command]git config gc.auto 0
2019-08-21T18:12:19.8889655Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2019-08-21T18:12:19.8936506Z ##[command]git config --get-all http.proxy
2019-08-21T18:12:19.9079955Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/63793/merge:refs/remotes/pull/63793/merge
---
2019-08-21T18:12:56.4261839Z do so (now or later) by using -b with the checkout command again. Example:
2019-08-21T18:12:56.4264361Z 
2019-08-21T18:12:56.4264801Z   git checkout -b <new-branch-name>
2019-08-21T18:12:56.4264836Z 
2019-08-21T18:12:56.4264915Z HEAD is now at 2dca2ba86 Merge 6fb59fa17c87baab781792ee5db4a9db834cb92f into 7b0085a613e69cb69fc9e4eb5d422fa4a39d5de1
2019-08-21T18:12:56.4450033Z ##[section]Starting: Collect CPU-usage statistics in the background
2019-08-21T18:12:56.4453709Z ==============================================================================
2019-08-21T18:12:56.4453796Z Task         : Bash
2019-08-21T18:12:56.4453848Z Description  : Run a Bash script on macOS, Linux, or Windows
---
2019-08-21T18:18:21.6178904Z     Checking core v0.0.0 (/checkout/src/libcore)
2019-08-21T18:18:21.7454577Z error: attributes are not yet allowed on `if` expressions
2019-08-21T18:18:21.7455006Z    --> src/libcore/iter/adapters/mod.rs:527:13
2019-08-21T18:18:21.7455359Z     |
2019-08-21T18:18:21.7455774Z 527 |             #[cfg(boostrap_stdarch_ignore_this)]
2019-08-21T18:18:21.7456216Z 
2019-08-21T18:18:21.7532903Z error: aborting due to previous error
2019-08-21T18:18:21.7532989Z 
2019-08-21T18:18:21.7553992Z error: Could not compile `core`.
2019-08-21T18:18:21.7553992Z error: Could not compile `core`.
2019-08-21T18:18:21.7554389Z warning: build failed, waiting for other jobs to finish...
2019-08-21T18:18:26.8122596Z error: build failed
2019-08-21T18:18:26.8178221Z command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "check" "--target" "x86_64-unknown-linux-gnu" "-Zbinary-dep-depinfo" "-j" "2" "--release" "--color" "always" "--features" "panic-unwind backtrace compiler-builtins-c" "--manifest-path" "/checkout/src/libstd/Cargo.toml" "--message-format" "json-render-diagnostics"
2019-08-21T18:18:26.8185797Z failed to run: /checkout/obj/build/bootstrap/debug/bootstrap check
2019-08-21T18:18:26.8185904Z Build completed unsuccessfully in 0:02:34
2019-08-21T18:18:26.8235919Z == clock drift check ==
2019-08-21T18:18:26.8253643Z   local time: Wed Aug 21 18:18:26 UTC 2019
2019-08-21T18:18:26.8253643Z   local time: Wed Aug 21 18:18:26 UTC 2019
2019-08-21T18:18:26.9816313Z   network time: Wed, 21 Aug 2019 18:18:26 GMT
2019-08-21T18:18:26.9819441Z == end clock drift check ==
2019-08-21T18:18:40.0060021Z ##[error]Bash exited with code '1'.
2019-08-21T18:18:40.0097444Z ##[section]Starting: Checkout
2019-08-21T18:18:40.0099262Z ==============================================================================
2019-08-21T18:18:40.0099348Z Task         : Get sources
2019-08-21T18:18:40.0099401Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

src/libcore/benches/ascii.rs Outdated Show resolved Hide resolved
src/libcore/num/f32.rs Outdated Show resolved Hide resolved
src/libcore/num/f64.rs Outdated Show resolved Hide resolved
@fbstj
Copy link
Contributor

fbstj commented Aug 22, 2019

are the six now-safe intrinsics (size_of_val, min_align_of_val, discriminant_value, type_id, likely, unlikely) blocked on different PRs?

@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 22, 2019

are the six now-safe intrinsics (size_of_val, min_align_of_val, discriminant_value, type_id, likely, unlikely) blocked on different PRs?

No, I just did them here because otherwise I'd have had to add // SAFETY: this intrinsic is safe to call, it's just unsafe because noone made it safe

@rust-highfive
Copy link
Collaborator

The job mingw-check of your PR failed (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
2019-08-22T14:50:56.1683604Z ##[command]git remote add origin https://github.com/rust-lang/rust
2019-08-22T14:50:56.1868101Z ##[command]git config gc.auto 0
2019-08-22T14:50:56.1938279Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2019-08-22T14:50:56.1999724Z ##[command]git config --get-all http.proxy
2019-08-22T14:50:56.2143890Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/63793/merge:refs/remotes/pull/63793/merge
2019-08-22T14:50:58.3345795Z remote:                                                                                         
---
2019-08-22T14:51:32.1438609Z do so (now or later) by using -b with the checkout command again. Example:
2019-08-22T14:51:32.1438913Z 
2019-08-22T14:51:32.1439431Z   git checkout -b <new-branch-name>
2019-08-22T14:51:32.1439741Z 
2019-08-22T14:51:32.1440477Z HEAD is now at b3f7bc1b4 Merge c2d541f186a249260f09a32a3aef31f190804cd0 into 201e52e5fe73ccf3dd22946b1216ad8d64f8c2ba
2019-08-22T14:51:32.1605694Z ##[section]Starting: Collect CPU-usage statistics in the background
2019-08-22T14:51:32.1609211Z ==============================================================================
2019-08-22T14:51:32.1609273Z Task         : Bash
2019-08-22T14:51:32.1609373Z Description  : Run a Bash script on macOS, Linux, or Windows
---
2019-08-22T14:56:48.4599632Z     Checking core v0.0.0 (/checkout/src/libcore)
2019-08-22T14:56:48.6125697Z error: attributes are not yet allowed on `if` expressions
2019-08-22T14:56:48.6126113Z    --> src/libcore/iter/adapters/mod.rs:527:13
2019-08-22T14:56:48.6126457Z     |
2019-08-22T14:56:48.6126798Z 527 |             #[cfg(boostrap_stdarch_ignore_this)]
2019-08-22T14:56:48.6127216Z 
2019-08-22T14:56:48.6225837Z error: aborting due to previous error
2019-08-22T14:56:48.6225948Z 
2019-08-22T14:56:48.6281427Z error: Could not compile `core`.
2019-08-22T14:56:48.6281427Z error: Could not compile `core`.
2019-08-22T14:56:48.6281804Z warning: build failed, waiting for other jobs to finish...
2019-08-22T14:56:53.4666642Z error: build failed
2019-08-22T14:56:53.4698646Z command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "check" "--target" "x86_64-unknown-linux-gnu" "-Zbinary-dep-depinfo" "-j" "2" "--release" "--color" "always" "--features" "panic-unwind backtrace compiler-builtins-c" "--manifest-path" "/checkout/src/libstd/Cargo.toml" "--message-format" "json-render-diagnostics"
2019-08-22T14:56:53.4710292Z failed to run: /checkout/obj/build/bootstrap/debug/bootstrap check
2019-08-22T14:56:53.4710554Z Build completed unsuccessfully in 0:02:31
2019-08-22T14:56:53.4763277Z == clock drift check ==
2019-08-22T14:56:53.4778337Z   local time: Thu Aug 22 14:56:53 UTC 2019
2019-08-22T14:56:53.4778337Z   local time: Thu Aug 22 14:56:53 UTC 2019
2019-08-22T14:56:53.6399129Z   network time: Thu, 22 Aug 2019 14:56:53 GMT
2019-08-22T14:56:53.6402864Z == end clock drift check ==
2019-08-22T14:57:06.7146623Z ##[error]Bash exited with code '1'.
2019-08-22T14:57:06.7195018Z ##[section]Starting: Checkout
2019-08-22T14:57:06.7196811Z ==============================================================================
2019-08-22T14:57:06.7196870Z Task         : Get sources
2019-08-22T14:57:06.7196922Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@alexcrichton
Copy link
Member

I'm not personally a huge fan of style lints like this in the sense that it just forces unhelpful boilerplate documentation to be written in many circumstances. Most of the safety comments I'm seeing sort of just trivially reference the line above it so I'm not really sure how a comment actually bolsters understanding of the function because it's all being read anyway. There's also other comments like "SAFETY: transmuting a sequence of u8 to u64 is always fine" which I feel like taken in isolation isn't actually right, in theory that requires some form of alignment checks as well but in reality the function being called here is doing all that internally (it's just the comment that's a bit misleading).

Overall I think I would prefer to not have a lint that guarantees all unsafe blocks are documented, but I'm certainly not the only member of the libs team!

@JohnCSimon
Copy link
Member

Ping from triage:
@oli-obk @fbstj @KodrAus
Hi! This has sat idle for over two weeks. What else needs to happen for this pr to move onwards?

Thank you!

@Centril
Copy link
Contributor

Centril commented Sep 8, 2019

It's no secret I think this PR is sorely needed.

Most of the safety comments I'm seeing sort of just trivially reference the line above it so I'm not really sure how a comment actually bolsters understanding of the function because it's all being read anyway.

I don't think each and every safety comment needs to be super helpful and if there are trivially true cases then those can just say "trivially safe". Often however, it's not as trivially true as one experienced reviewer, e.g., @alexcrichton, who often looks at unsafe blocks might think. I think a better question is: are these actively harmful?

which I feel like taken in isolation isn't actually right, in theory that requires some form of alignment checks as well but in reality the function being called here is doing all that internally (it's just the comment that's a bit misleading).

This suggests that it's in fact not as trivial as one might think and that more elaboration is needed rather than not having a comment at all.

It's also not the trivial cases that we wish to catch with these comments but it's also not possible to distinguish them without far more complicated semantic analysis (proofs in e.g. Coq). I think the cost of a bit of boilerplate in trivial cases is well worth it if we get to have commentary on more complicated cases. One might say that this problem should be solved with good reviews. To me however, it is clear that reviews have not been satisfactory thus far.

Also, from a language team perspective, I find it really important to know what assumptions the standard library makes on our operational semantics.

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Sep 10, 2019
@alexcrichton
Copy link
Member

@rfcbot fcp close

I'm going to propse that we close this. Other members of the libs team can of course disagree! I explained some reasoning above.

@Centril I personally think that annoying lints can be actively harmful because they encourage the wrong coding patterns and give a false sense of security or correctness. An unsafe block is no more correct if a comment is indicated as to why it's ok, nor is it even any more correct if the comment is quite long and tries to explain everything. Fundamentally unsafe blocks will be mistakes eventually and while we can try to head it off I see this as generating too much work for not enough gain.

No one writing these unsafe blocks has the full context of the language in their head and runs through a checklist of "does every single guarantee Rust provides stay upheld?" as they either write unsafe or write a comment. You say it's good to know what assumptions the libs team is making, and I can say from my perspective I don't even know what assumptions I'm making when writing unsafe. It's not like we're trying to nefariously write code behind the lang team's back that is a gotcha for whatever unsafe guidelines are proposed. In reality we're just trying to get things done and most of this code is years old and probably didn't have a better alternative at the time, but everyone will basically always welcome centralization of unsafe to have fewer and fewer unsafe blocks.

@rfcbot
Copy link

rfcbot commented Sep 10, 2019

Team member @alexcrichton has proposed to close this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. labels Sep 10, 2019
@Centril
Copy link
Contributor

Centril commented Sep 10, 2019

@alexcrichton

@Centril I personally think that annoying lints can be actively harmful because they encourage the wrong coding patterns [...]

Do you think documenting why an unsafe { ... } block is correct is the wrong coding pattern? I think that's something great and which should be encouraged.

[...] and give a false sense of security or correctness.

This feels to me like "because we cannot have perfect security or correctness we should not try". I'd like to point to @RalfJung's comments 1 and 2 which feel relevant.

An unsafe block is no more correct if a comment is indicated as to why it's ok, nor is it even any more correct if the comment is quite long and tries to explain everything.

Comments are not intended as a proof of correctness. However, they can at least be read by humans during review and after the fact. I believe one can use good judgement in the length one gives to unsafe comments. No one has suggested that one should "explain everything".

Fundamentally unsafe blocks will be mistakes eventually and while we can try to head it off I see this as generating too much work for not enough gain.

I can understand that it might be a non-trivial amount for you in libstd's operating system related code. However, in key parts like libcore and liballoc I think it's well worth the time investment.

No one writing these unsafe blocks has the full context of the language in their head and runs through a checklist of "does every single guarantee Rust provides stay upheld?" as they either write unsafe or write a comment.

No one has suggested that you read through the entire nomicon each time you write unsafe { ... } but there are often some specific things to think through depending on what parts of the language you are using.

You say it's good to know what assumptions the libs team is making, [...]. It's not like we're trying to nefariously write code behind the lang team's back that is a gotcha for whatever unsafe guidelines are proposed.

I did not say "what assumptions the libs team is making". I think most of the code added to the standard library are in fact added by people outside the libs team. I referred to assumptions of the standard library as a whole. This is useful information when doing future language design or just for internal consistency of across the standard library.

[...], and I can say from my perspective I don't even know what assumptions I'm making when writing unsafe. [...]

Perhaps we interpret "assumptions" differently, but I would hope that you at least consider what invariants you must uphold in an unsafe { ... } block. If not, that does not inspire confidence.

In reality we're just trying to get things done and most of this code is years old and probably didn't have a better alternative at the time, but everyone will basically always welcome centralization of unsafe to have fewer and fewer unsafe blocks.

"Getting shit done"-ism is not an excuse nor in opposition to being deliberate and careful with unsafe { ... }. Nor is adding comments in opposition to reducing the amount of unsafe blocks through refactoring (who is against that?). I am not suggesting that we do everything over night, but we can improve and audit the years-old code over a longer period of time.

@sfackler
Copy link
Member

I think that's something great and which should be encouraged.

This does not encourage documentation of the safety of unsafe blocks, it mandates it.

@Centril
Copy link
Contributor

Centril commented Sep 10, 2019

@sfackler

This does not encourage documentation of the safety of unsafe blocks, it mandates it.

Strictly speaking, tidy has a way to opt-out per file.

Overall it is true however that it mandates documentation, but I think that's a good idea as well. Otherwise, people will oftentimes not do it when it is necessary and I think that's a problem.

@alexcrichton
Copy link
Member

I don't think you'll find anyone on any Rust team who actively says unsafe blocks should not be documented, and it seems somewhat disingenuous to think that's where I'm coming from? I am specifically objecting to what @sfackler mentioned, the mandate and requirement to document all unsafe blocks.

The amount of work to document unsafe blocks in libcore/liballoc is going to be the same whether or not we have this lint, so if that's the goal of the work why not proceed with that without the lint? I've got what I believe are legitimate concerns about the lint actively hindering and causing problems in some areas, but it's not like I'm trying to say "remove all the documentation from unsafe blocks".

@Centril
Copy link
Contributor

Centril commented Sep 11, 2019

@alexcrichton

I don't think you'll find anyone on any Rust team who actively says unsafe blocks should not be documented, [...]

I hope you are right but I think that's a low bar to set.

[...], and it seems somewhat disingenuous to think that's where I'm coming from?

Elaboration re. what you mean by "wrong coding patterns" would be helpful then as your argument largely felt like "this is security theater".

The amount of work to document unsafe blocks in libcore/liballoc is going to be the same whether or not we have this lint, so if that's the goal of the work why not proceed with that without the lint?

Because once we have done so, without lints, the documentation coverage would regress over time. If tidy yells at you, even if we have a local override like // ignore-tidy-safety on an unsafe { ... } block, at least the developer will be nudged in the direction of providing the comment. In my view, as unsafe { ... } is a dangerous thing to do (I mean... it says so on the tin...), it is well justified to lint in this case.

I've got what I believe are legitimate concerns about the lint actively hindering and causing problems in some areas, [...]

I think those concerns have not been fleshed out. I replied re. the specifics of alignment and whatnot and would appreciate fewer generalities in return. Let's be specific and concrete about what those areas are where problems and hindrances occur (like... are there specific file paths you'd not like to lint, like in the OS / FFI code?). Perhaps we can find a middle ground.

@sfackler
Copy link
Member

After looking through the diff here, 90+% of them look like "SAFETY: we performed the check that's one line above this unsafe block", or "SAFETY: it's okay to do this". These read to me like line noise - they add no real additional information to the context. It reminds me of Javadoc requirements in some of my college homework assignments where you end up with hundreds of lines of @param people The people.

Every PR is code reviewed by another person. It's their job to make sure that the code is sufficiently comprehensible. Sometimes there are tricky invariants or behavior that require a couple of lines of explanation in comments. Sometimes those happen to be oriented around unsafe blocks, but they often aren't.

If someone comes along and is confused by some piece of code, that may result in someone making a PR adding some extra commentary on it, but again, that doesn't have anything to do with what kind of expression that code is.

@alexcrichton
Copy link
Member

I hadn't articulated that before, but yeah this sort of feels like "safety theater" where we're trying to give ourselves a sense of security or a sense of "oh here's everything we have to account for", when in fact a strict lint such as this is largely only going to manufacture comments like @sfackler was mentioning. I, too, have seen endless instances of @param people The people and would prefer to keep that out of the standard library. I concretely mentioned this above.

I don't feel this is the time to get sort of more specific than that, I'm not trying to critize @oli-obk's work here. I'm criticizing the idea that we should have this lint. I think the appropriate way to go about this (if we were to go about this) is to start down the road of documenting unsafe blocks and reviewing those sorts of PRs. If 90% of unsafe blocks are trivial and don't need comments, this lint is probably over the top. If 90% of the comments are on the module instead of unsafe blocks, then literally this PR probably isn't what we want but maybe something similar to it. If only 5% of unsafe blocks end up not needing meaningful documentation, then this sounds like a good lint to have.

I also very much agree with @sfackler that the unsafe block itself is not always where documentation should live. For example std::sync::mpsc is riddled with unsafe but has a large explainer so readers can try to get a rough idea what's going on. I don't think it's really that helpful to have // SAFETY: see the module docs all over the place, nor is it really sufficient because you need to undertand all the safe code as well. This applies to abstractions like Vec as well, we ideally need // SAFETY whenever we modify self.len, but in general you just have to understand the whole module and its algorithms.

I hope you are right but I think that's a low bar to set.

FWIW this still feels very disingenuous, "I hope you are right" again seems like it's attempting to cast me, personally, in this ominous light of "oh look at this guy who refuses to document things and doesn't want anyone else to document anything". I get the impression you really think we should actively not document unsafe code, which gives me the impression that you're not really fully reading what I'm saying and trying to understand it.

@oli-obk
Copy link
Contributor Author

oli-obk commented Sep 12, 2019

I wrote this lint with the impression that most unsafe blocks can have meaningful docs. Then I used it and as obvious, that is not the case. It can be partially resolved by expecting either the function docs or the module docs (of any of the surrounding modules) contain an explanatory safety comment. This would solve the "read the docs" comments that I added. Since this scheme allows incrementally expanding its scope, I was hoping to iterate on it to end up at a good scheme.

Since very few unsafe blocks are documented so far, I'd be fine with first starting to do this without a lint and then revisiting once we're nearing full documentation.

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @oli-obk.

I read all the "SAFETY:" comments added in this PR and there aren't any that I would mind having in the codebase. I would be happy with merging this.


I find this comment from @Centril compelling:

It's also not the trivial cases that we wish to catch with these comments. [...] I think the cost of a bit of boilerplate in trivial cases is well worth it if we get to have commentary on more complicated cases. One might say that this problem should be solved with good reviews. To me however, it is clear that reviews have not been satisfactory thus far.


In response to @alexcrichton's and @sfackler's similar concerns,

Most of the safety comments I'm seeing sort of just trivially reference the line above it so I'm not really sure how a comment actually bolsters understanding of the function because it's all being read anyway.

After looking through the diff here, 90+% of them look like "SAFETY: we performed the check that's one line above this unsafe block", or "SAFETY: it's okay to do this". These read to me like line noise - they add no real additional information to the context.

I understand this reaction but it's meaningful to call out when the check on the previous line is the only consideration in an unsafe block being okay, as opposed to some other ambient invariant that comes in from elsewhere. We could reserve comments for only those unsafe blocks with a requirement more complicated than the previous line, but when touching nearby code it's a big help to be told when the safety reasoning is all just local.

@bors
Copy link
Contributor

bors commented Sep 17, 2019

☔ The latest upstream changes (presumably #64535) made this pull request unmergeable. Please resolve the merge conflicts.

@RalfJung
Copy link
Member

RalfJung commented Sep 18, 2019

I am fully in favor of this PR! unsafe operations are marked that way because they have extra preconditions, and an unsafe block is an indication by the programmer to the compiler "yes I checked those preconditions". It only seems reasonable to ask for a brief comment (a) showing that this check really happened, and (b) helping others to follow the argument and audit that the check was done correctly.

I am sure @Shnatsel's safety dance would benefit a lot from such conventions being more widely applied in the ecosystem.

Most of the safety comments I'm seeing sort of just trivially reference the line above it so I'm not really sure how a comment actually bolsters understanding of the function because it's all being read anyway.

I think it is extremely informative that the author of an unsafe block thought that this unsafe block is justified by just locally checking the line(s) above. That makes review just so much easier: if, after a local check, I am not convinced that this unsafe block is correct, I don't need to start a complex analysis of the invariants in this file (because maybe the block actually is correct but the argument is more complex). I know immediately that the author of the code made a mistake, saving a lot of time.

The comment saying "correct because we just checked X" takes very little time to write, and documents an important piece of information that makes audits a lot simpler. I think that's a good comment.

There's also other comments like "SAFETY: transmuting a sequence of u8 to u64 is always fine" which I feel like taken in isolation isn't actually right, in theory that requires some form of alignment checks as well but in reality the function being called here is doing all that internally (it's just the comment that's a bit misleading).

So the correctness argument is more subtle than @oli-obk thought; all the more reason to write it down, I'd say.

No one writing these unsafe blocks has the full context of the language in their head and runs through a checklist of "does every single guarantee Rust provides stay upheld?" as they either write unsafe or write a comment. You say it's good to know what assumptions the libs team is making, and I can say from my perspective I don't even know what assumptions I'm making when writing unsafe.

I assume that you have some argument in your head for why this unsafe block will not cause UB. Sure, the argument does not have to be a full-blown Coq proof, but my expectation is that people don't just write unsafe and hope that it'll work out.

Given that, all we ask is that this -- informal, possibly incomplete -- argument be written down. The argument has already been made by the programmer to themselves; putting it down in writing (a) helps them clarify their own thinking, and (b) helps others by not making them re-do the same argument again from scratch.

It's not like we're trying to nefariously write code behind the lang team's back that is a gotcha for whatever unsafe guidelines are proposed. In reality we're just trying to get things done and most of this code is years old and probably didn't have a better alternative at the time, but everyone will basically always welcome centralization of unsafe to have fewer and fewer unsafe blocks.

Nobody is assuming bad intentions here. But how are we supposed to figure out guidelines if we do not even know the kind of reasoning y'all are using to justify the use of unsafe operations to yourself? The PR does not ask that you quote a UCG document for every piece of reasoning, it asks that you give a brief hint for why you think this is or should be correct.

So, beyond the benefits listed above, such comments actually are an enormously useful resource to the UCG as they document what some very experience Rust programmers think are good rules for unsafe. If that does not align with what we think, we'd like to know so that we can start a discussion. But if your thinking is never written down, we won't notice the mismatch. Doing language design "in the dark", without real-world input, is really hard; such comments would go a long way to help us ground our efforts and make us aware of common and important unsafe idioms.

As @Centril said, there might also be code where this is less useful, such as code interacting closely with OS/FFI interfaces. Let's not lint on that code and instead find areas where it can be more useful. But by sheer volume, that is a minority of the code in this repo, so we have lots of other code that could benefit a lot from such a lint.

This does not encourage documentation of the safety of unsafe blocks, it mandates it.

The amount of work to document unsafe blocks in libcore/liballoc is going to be the same whether or not we have this lint, so if that's the goal of the work why not proceed with that without the lint?

Rust has a history of using tools to guide our process, as experience shows that a policy not checked by a tool is most likely not going to be followed very well. We have a large and diverse set of PR authors and reviewers, and at least I am not aware of a "checklist" that reviewers are supposed to follow. So, basically, any repo-wide effort not guarded by a tool is unlikely to actually work. This went far enough that when we had PRs that did some repo-wide language editing (full stops and capitalization and things like that), the general response was "those PRs are not worth it unless we have a tool that checks that these mistakes do not creep back in". I would say the same argument applies here: trying to document unsafe blocks is not worth it unless we have a tool that makes sure no new undocumented unsafe blocks can creep back in.

Personally I feel like some checks tidy is doing (such as line lengths or number of lines or banning TODO) are mostly annoying, and are fine to be encouraged but not good to be mandated. But I appreciate that the only way to actually be consistent about such things is to have a tool check them, so I dutifully break my lines and replace TODO by FIXME (or remove it) every time tidy calls me out.

If we had a "reviewer checklist", a good first step might be to add a requirement of documentation to that checklist. But we don't, or rather, our checklist is called tidy.

You are saying you are fine to encourage comments on unsafe blocks, but how, concretely, do you suggest we actually do that in a way that is effective? The way I view it, tidy is how we encourage things in this repo.

After looking through the diff here, 90+% of them look like "SAFETY: we performed the check that's one line above this unsafe block", or "SAFETY: it's okay to do this". These read to me like line noise - they add no real additional information to the context.

@Centril and me have argued above for why that does add information. It severely limits the search space of someone auditing the code. It is also very helpful for the UCG to know that most unsafe blocks are "trivial"/"local". In fact, I'd even be in favor of having a special keyword for this case so that we can have statistics on this, but that might be asking too much.

If 90% of unsafe blocks are trivial and don't need comments, this lint is probably over the top.

Disagreed, for the reasons mentioned above.

If 90% of the comments are on the module instead of unsafe blocks, then literally this PR probably isn't what we want but maybe something similar to it.

Then the unsafe blocks could still use a marker either referencing a module-level invariant or not.

The key important property of unsafe in Rust is that every bit of UB in a Rust program is related to some unsafe block somewhere. Sure, it is not always the unsafe block's fault (safe code can break invariants it relied upon), but safe code alone can never be at fault either. There's always a connection, through some invariant or so, to some unsafe code. That makes unsafe code a great place to start a code review: unsafe blocks are like the seed crystals from where the analysis begins; sometimes the analysis can be completed locally because the argument is simple, sometimes the analysis has to grow into surrounding safe code and encompass most of the module; but it is always correct to start from the unsafe blocks and then grow from there.

In that process, the hardest part is figuring out just how much information you need to argue for some unsafe block's correctness. Even a brief comment saying "local checks" vs "module-level invariant" goes a long way to declare the scope that needs to be considered here, making unsafe-centric review so much simpler.

@Shnatsel (and maybe others from safety-dance), you have done quite a bit of review of other people's unsafe code; your input in this matter would be much appreciated.

(I wrote this as I am catching up in this thread, and now realize many of these points have already been raised by @Centril. So I guess what I am saying is, I am in full agreement, and I think this lint would benefit both the quality of code in libcore, would greatly simplify unsafe-centric code audits, and would provide tons of useful input to the UCG and lang team.)

@Centril
Copy link
Contributor

Centril commented Sep 18, 2019

Thanks @RalfJung & @dtolnay; excellent points!

I'll make two further ones which I believe have not been raised prior:

  1. The standard library code has special requirements and privileges. As such, it is unlikely to be idiomatic Rust. Nonetheless, people do read the standard library code from time to time and take inspiration from it. This gives us an opportunity to encourage the documentation of unsafe in the wider crates.io ecosystem.

  2. This may be of use for certification efforts for code where MISRA C would be otherwise be used. -- @jamesmunns may be interested?

@oli-obk oli-obk added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 6, 2019
@oli-obk
Copy link
Contributor Author

oli-obk commented Nov 6, 2019

This is ready for review again

@Centril
Copy link
Contributor

Centril commented Nov 6, 2019

r? @dtolnay

@rust-highfive rust-highfive assigned dtolnay and unassigned KodrAus Nov 6, 2019
@dtolnay
Copy link
Member

dtolnay commented Nov 7, 2019

Thank you @oli-obk, this is great.

@bors r+

@bors
Copy link
Contributor

bors commented Nov 7, 2019

📌 Commit e28287b has been approved by dtolnay

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 7, 2019
@bors
Copy link
Contributor

bors commented Nov 7, 2019

⌛ Testing commit e28287b with merge d6d3560f88327c70c2d8a2244e2b7abc668bbd75...

Centril added a commit to Centril/rust that referenced this pull request Nov 7, 2019
Have tidy ensure that we document all `unsafe` blocks in libcore

cc @rust-lang/libs

I documented a few and added ignore flags on the other files. We can incrementally document the files, but won't regress any files this way.
@Centril
Copy link
Contributor

Centril commented Nov 7, 2019

@bors retry rolled up.

@bors
Copy link
Contributor

bors commented Nov 7, 2019

⌛ Testing commit e28287b with merge 35386f2dc586e83ed4d4a4d07cd325014f09232c...

Centril added a commit to Centril/rust that referenced this pull request Nov 7, 2019
Have tidy ensure that we document all `unsafe` blocks in libcore

cc @rust-lang/libs

I documented a few and added ignore flags on the other files. We can incrementally document the files, but won't regress any files this way.
@Centril
Copy link
Contributor

Centril commented Nov 7, 2019

@bors retry rolled up.

bors added a commit that referenced this pull request Nov 7, 2019
Rollup of 5 pull requests

Successful merges:

 - #63793 (Have tidy ensure that we document all `unsafe` blocks in libcore)
 - #64696 ([rustdoc] add sub settings)
 - #65916 (syntax: move stuff around)
 - #66087 (Update some build-pass ui tests to use check-pass where applicable)
 - #66182 (invalid_value lint: fix help text)

Failed merges:

r? @ghost
bors added a commit that referenced this pull request Nov 7, 2019
Rollup of 5 pull requests

Successful merges:

 - #63793 (Have tidy ensure that we document all `unsafe` blocks in libcore)
 - #64696 ([rustdoc] add sub settings)
 - #65916 (syntax: move stuff around)
 - #66087 (Update some build-pass ui tests to use check-pass where applicable)
 - #66182 (invalid_value lint: fix help text)

Failed merges:

r? @ghost
@bors
Copy link
Contributor

bors commented Nov 8, 2019

⌛ Testing commit e28287b with merge e8f43b7...

@bors bors merged commit e28287b into rust-lang:master Nov 8, 2019
@oli-obk oli-obk deleted the 🧹 branch November 8, 2019 13:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.